<!DOCTYPE html>
<html >

<head>

  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>Code for "Categorical Data Analysis"</title>
  <meta name="description" content="Code for &quot;Categorical Data Analysis&quot;.">
  <meta name="generator" content="bookdown 0.7 and GitBook 2.6.7">

  <meta property="og:title" content="Code for &quot;Categorical Data Analysis&quot;" />
  <meta property="og:type" content="book" />
  
  
  <meta property="og:description" content="Code for &quot;Categorical Data Analysis&quot;." />
  

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="Code for &quot;Categorical Data Analysis&quot;" />
  
  <meta name="twitter:description" content="Code for &quot;Categorical Data Analysis&quot;." />
  



<meta name="date" content="2018-09-05">

  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-status-bar-style" content="black">
  
  
<link rel="prev" href="glm.html">
<link rel="next" href="build-and-apply-logistic-model.html">
<script src="libs/jquery/jquery.min.js"></script>
<link href="libs/gitbook/css/style.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-fontsettings.css" rel="stylesheet" />









<style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(data-line-number);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>

<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>

<body>



  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">

<ul class="summary">
<li><a href="./index.html">Code for "Categorical Data Analysis"</a></li>

<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Preface</a></li>
<li class="chapter" data-level="1" data-path="intro.html"><a href="intro.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
<li class="chapter" data-level="1.1" data-path="intro.html"><a href="intro.html#data-intro"><i class="fa fa-check"></i><b>1.1</b> Categorical Response Data</a></li>
<li class="chapter" data-level="1.2" data-path="intro.html"><a href="intro.html#prob-dist"><i class="fa fa-check"></i><b>1.2</b> Probability Distributions for Categorical Data</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布计算"><i class="fa fa-check"></i>Binomial distribution calculations</a></li>
</ul></li>
<li class="chapter" data-level="1.3" data-path="intro.html"><a href="intro.html#stat-infer"><i class="fa fa-check"></i><b>1.3</b> Statistical Inference for a Proportion</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布似然函数图"><i class="fa fa-check"></i>Binomial likelihood function plot</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布假设检验"><i class="fa fa-check"></i>Binomial hypothesis tests</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布置信区间"><i class="fa fa-check"></i>Binomial confidence intervals</a></li>
</ul></li>
<li class="chapter" data-level="1.4" data-path="intro.html"><a href="intro.html#more-stat-infer"><i class="fa fa-check"></i><b>1.4</b> More on Statistical Inference for Discrete Data</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#二项分布参数统计推断"><i class="fa fa-check"></i>Statistical inference for the binomial parameter</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#小样本推断"><i class="fa fa-check"></i>Small-sample inference</a></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#小样本推断p值调整"><i class="fa fa-check"></i>P-value adjustment for small-sample inference</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#problems-ch1"><i class="fa fa-check"></i>Problems</a><ul>
<li class="chapter" data-level="" data-path="intro.html"><a href="intro.html#第4题"><i class="fa fa-check"></i>Problem 4</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="2" data-path="contingency-table.html"><a href="contingency-table.html"><i class="fa fa-check"></i><b>2</b> Contingency Tables</a><ul>
<li class="chapter" data-level="2.1" data-path="contingency-table.html"><a href="contingency-table.html#stucture"><i class="fa fa-check"></i><b>2.1</b> Probability Structure for Contingency Tables</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#关于来世"><i class="fa fa-check"></i>Belief in afterlife</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="contingency-table.html"><a href="contingency-table.html#prop-compare"><i class="fa fa-check"></i><b>2.2</b> Comparing Proportions in 2×2 Tables</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#阿司匹林与心脏病列联表检验"><i class="fa fa-check"></i>Aspirin and heart attacks (contingency table tests)</a></li>
</ul></li>
<li class="chapter" data-level="2.3" data-path="contingency-table.html"><a href="contingency-table.html#odds-ratio"><i class="fa fa-check"></i><b>2.3</b> The Odds Ratio</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#阿司匹林与心脏病优势比"><i class="fa fa-check"></i>Aspirin and heart attacks (odds ratio)</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#吸烟状态与心肌梗死"><i class="fa fa-check"></i>Smoking status and myocardial infarction</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="contingency-table.html"><a href="contingency-table.html#chi-square-test"><i class="fa fa-check"></i><b>2.4</b> Chi-Squared Tests of Independence</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#性别和党派认同"><i class="fa fa-check"></i>Gender and party identification</a></li>
</ul></li>
<li class="chapter" data-level="2.5" data-path="contingency-table.html"><a href="contingency-table.html#indenpendence-test-for-ordinal-data"><i class="fa fa-check"></i><b>2.5</b> Testing Independence for Ordinal Data</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#饮酒与婴儿畸形"><i class="fa fa-check"></i>Alcohol use and infant malformation</a></li>
</ul></li>
<li class="chapter" data-level="2.6" data-path="contingency-table.html"><a href="contingency-table.html#exact-test-for-small-sample"><i class="fa fa-check"></i><b>2.6</b> Exact Inference for Small Samples</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#女士品茶"><i class="fa fa-check"></i>Lady tasting tea</a></li>
</ul></li>
<li class="chapter" data-level="2.7" data-path="contingency-table.html"><a href="contingency-table.html#three-way-table"><i class="fa fa-check"></i><b>2.7</b> Association in Three-Way Tables</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#死刑判决案例"><i class="fa fa-check"></i>Death penalty example</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#临床试验"><i class="fa fa-check"></i>Clinical trial</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#problems-ch2"><i class="fa fa-check"></i>Problems</a><ul>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第18题"><i class="fa fa-check"></i>Problem 18</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第22题"><i class="fa fa-check"></i>Problem 22</a></li>
<li class="chapter" data-level="" data-path="contingency-table.html"><a href="contingency-table.html#第33题"><i class="fa fa-check"></i>Problem 33</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="glm.html"><a href="glm.html"><i class="fa fa-check"></i><b>3</b> Generalized Linear Models</a><ul>
<li class="chapter" data-level="3.1" data-path="glm.html"><a href="glm.html#components-of-glm"><i class="fa fa-check"></i><b>3.1</b> Components of a Generalized Linear Model</a></li>
<li class="chapter" data-level="3.2" data-path="glm.html"><a href="glm.html#glm-for-binary-data"><i class="fa fa-check"></i><b>3.2</b> Generalized Linear Models for Binary Data</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#打鼾与心脏病"><i class="fa fa-check"></i>Snoring and heart disease</a></li>
</ul></li>
<li class="chapter" data-level="3.3" data-path="glm.html"><a href="glm.html#glm-for-count-data"><i class="fa fa-check"></i><b>3.3</b> Generalized Linear Models for Count Data</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#母鲎及其追随者泊松glm"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (Poisson GLM)</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#母鲎及其追随者负二项glm"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (negative binomial GLM)</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#英国的火车事故"><i class="fa fa-check"></i>British train accidents</a></li>
</ul></li>
<li class="chapter" data-level="3.4" data-path="glm.html"><a href="glm.html#stat-infer-glm"><i class="fa fa-check"></i><b>3.4</b> Statistical Inference and Model Checking</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#打鼾与心脏病-1"><i class="fa fa-check"></i>Snoring and heart disease</a></li>
</ul></li>
<li class="chapter" data-level="3.5" data-path="glm.html"><a href="glm.html#fit-glm"><i class="fa fa-check"></i><b>3.5</b> Fitting Generalized Linear Models</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#problems-ch3"><i class="fa fa-check"></i>Problems</a><ul>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第3题"><i class="fa fa-check"></i>Problem 3</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第4题-1"><i class="fa fa-check"></i>Problem 4</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第7题"><i class="fa fa-check"></i>Problem 7</a></li>
<li class="chapter" data-level="" data-path="glm.html"><a href="glm.html#第20题"><i class="fa fa-check"></i>Problem 20</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="logistic-regression.html"><a href="logistic-regression.html"><i class="fa fa-check"></i><b>4</b> Logistic Regression</a><ul>
<li class="chapter" data-level="4.1" data-path="logistic-regression.html"><a href="logistic-regression.html#interpret-logistic"><i class="fa fa-check"></i><b>4.1</b> Interpreting the Logistic Regression Model</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#母鲎及其追随者logistic回归"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (logistic regression)</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="logistic-regression.html"><a href="logistic-regression.html#infer-logistic"><i class="fa fa-check"></i><b>4.2</b> Inference for Logistic Regression</a></li>
<li class="chapter" data-level="4.3" data-path="logistic-regression.html"><a href="logistic-regression.html#cate-var-logistic"><i class="fa fa-check"></i><b>4.3</b> Logistic Regression with Categorical Predictors</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#azt和aids"><i class="fa fa-check"></i>AZT and AIDS</a></li>
</ul></li>
<li class="chapter" data-level="4.4" data-path="logistic-regression.html"><a href="logistic-regression.html#multi-logistic"><i class="fa fa-check"></i><b>4.4</b> Multiple Logistic Regression</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#母鲎及其追随者多元logistic"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (multiple logistic regression)</a></li>
</ul></li>
<li class="chapter" data-level="4.5" data-path="logistic-regression.html"><a href="logistic-regression.html#logistic回归效应的概括"><i class="fa fa-check"></i><b>4.5</b> Summarizing Effects in Logistic Regression</a></li>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#problem-ch4"><i class="fa fa-check"></i>Problems</a><ul>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#第8题"><i class="fa fa-check"></i>Problem 8</a></li>
<li class="chapter" data-level="" data-path="logistic-regression.html"><a href="logistic-regression.html#第24题"><i class="fa fa-check"></i>Problem 24</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="5" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html"><i class="fa fa-check"></i><b>5</b> Building and Applying Logistic Regression Models</a><ul>
<li class="chapter" data-level="5.1" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#model-selection"><i class="fa fa-check"></i><b>5.1</b> Strategies in Model Selection</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者模型选择"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (model selection)</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者预测功效"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (predictive power)</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#model-checking"><i class="fa fa-check"></i><b>5.2</b> Model Checking</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者模型lr检验"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (LR model test)</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#azt和aids拟合优度"><i class="fa fa-check"></i>AZT and AIDS (goodness of fit)</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#母鲎及其追随者hm检验"><i class="fa fa-check"></i>Female horseshoe crabs and their satellites (HM test)</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#佛罗里达大学研究生入学"><i class="fa fa-check"></i>Graduate admissions at the University of Florida</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#心脏病与血压的关系"><i class="fa fa-check"></i>Heart disease and blood pressure</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#sparse-data-logistic"><i class="fa fa-check"></i><b>5.3</b> Effects of Sparse Data</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#稀疏数据的临床试验结果"><i class="fa fa-check"></i>Clinical trial results with sparse data</a></li>
</ul></li>
<li class="chapter" data-level="5.4" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#conditional-logistic"><i class="fa fa-check"></i><b>5.4</b> Conditional Logistic Regression and Exact Inference</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#晋升能力"><i class="fa fa-check"></i>Promotion</a></li>
</ul></li>
<li class="chapter" data-level="5.5" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#logistic-sample-num"><i class="fa fa-check"></i><b>5.5</b> Sample Size and Power for Logistic Regression</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#样本量计算"><i class="fa fa-check"></i>Sample size calculation</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#problem-ch5"><i class="fa fa-check"></i>Problems</a><ul>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第10题"><i class="fa fa-check"></i>Problem 10</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第18题-1"><i class="fa fa-check"></i>Problem 18</a></li>
<li class="chapter" data-level="" data-path="build-and-apply-logistic-model.html"><a href="build-and-apply-logistic-model.html#第28题"><i class="fa fa-check"></i>Problem 28</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="6" data-path="multi-logit-model.html"><a href="multi-logit-model.html"><i class="fa fa-check"></i><b>6</b> Multicategory Logit Models</a><ul>
<li class="chapter" data-level="6.1" data-path="multi-logit-model.html"><a href="multi-logit-model.html#nomial-logit"><i class="fa fa-check"></i><b>6.1</b> Logit Models for Nominal Responses</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#钝吻鳄食物选择"><i class="fa fa-check"></i>Alligator food choice</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#是否相信来世"><i class="fa fa-check"></i>Belief in afterlife</a></li>
</ul></li>
<li class="chapter" data-level="6.2" data-path="multi-logit-model.html"><a href="multi-logit-model.html#ordinal-logit"><i class="fa fa-check"></i><b>6.2</b> Cumulative Logit Models for Ordinal Responses</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#政治意识形态和隶属党派的关系"><i class="fa fa-check"></i>Political ideology and party affiliation</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#对心理健康建模"><i class="fa fa-check"></i>Modeling mental health</a></li>
</ul></li>
<li class="chapter" data-level="6.3" data-path="multi-logit-model.html"><a href="multi-logit-model.html#paired-ordinal-logit"><i class="fa fa-check"></i><b>6.3</b> Paired-Category Ordinal Logits</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#再访政治意识形态"><i class="fa fa-check"></i>Political ideology revisited</a></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#发育毒性研究"><i class="fa fa-check"></i>Developmental toxicity study</a></li>
</ul></li>
<li class="chapter" data-level="6.4" data-path="multi-logit-model.html"><a href="multi-logit-model.html#conditional-independent"><i class="fa fa-check"></i><b>6.4</b> Tests of Conditional Independence</a><ul>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#工作满意度和收入"><i class="fa fa-check"></i>Job satisfaction and income</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="multi-logit-model.html"><a href="multi-logit-model.html#ch6-problems"><i class="fa fa-check"></i>Problems</a></li>
</ul></li>
<li class="appendix"><span><b>Appendix</b></span></li>
<li class="chapter" data-level="A" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html"><i class="fa fa-check"></i><b>A</b> Using the Companion R Package</a><ul>
<li class="chapter" data-level="A.1" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html#r-pkg-install"><i class="fa fa-check"></i><b>A.1</b> Installation</a></li>
<li class="chapter" data-level="A.2" data-path="r-pkg-intro.html"><a href="r-pkg-intro.html#r-pkg-use"><i class="fa fa-check"></i><b>A.2</b> Usage</a></li>
</ul></li>
<li class="chapter" data-level="B" data-path="book-dataset-list.html"><a href="book-dataset-list.html"><i class="fa fa-check"></i><b>B</b> List of Textbook Datasets</a><ul>
<li class="chapter" data-level="B.1" data-path="book-dataset-list.html"><a href="book-dataset-list.html#正文案例数据"><i class="fa fa-check"></i><b>B.1</b> Datasets for In-Text Examples</a></li>
<li class="chapter" data-level="B.2" data-path="book-dataset-list.html"><a href="book-dataset-list.html#习题数据"><i class="fa fa-check"></i><b>B.2</b> Datasets for Exercises</a></li>
</ul></li>
<li class="divider"></li>
<li><a href="https://bookdown.org" target="blank">Published with bookdown</a></li>

</ul>

      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Code for "Categorical Data Analysis"</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="logistic-regression" class="section level1">
<h1><span class="header-section-number">Chapter 4</span> Logistic Regression</h1>
<div id="interpret-logistic" class="section level2">
<h2><span class="header-section-number">4.1</span> Interpreting the Logistic Regression Model</h2>
<div id="母鲎及其追随者logistic回归" class="section level3 unnumbered">
<h3>Female horseshoe crabs and their satellites (logistic regression)</h3>
<div class="sourceCode" id="cb233"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb233-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb233-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb233-3" data-line-number="3">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span><span class="kw">as.integer</span>(horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)</a>
<a class="sourceLine" id="cb233-4" data-line-number="4"></a>
<a class="sourceLine" id="cb233-5" data-line-number="5">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(psat <span class="op">~</span><span class="st"> </span>Width, <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>())</a>
<a class="sourceLine" id="cb233-6" data-line-number="6"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Width, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.028  -1.046   0.548   0.907   1.694  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -12.351      2.629   -4.70  2.6e-06 ***
## Width          0.497      0.102    4.89  1.0e-06 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 194.45  on 171  degrees of freedom
## AIC: 198.5
## 
## Number of Fisher Scoring iterations: 4</code></pre>
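<p>As a rough illustration of how the fitted coefficients can be read, the sketch below plugs the rounded estimates printed above (intercept -12.351, slope 0.497) into the logistic formula. The helper name <code>p_hat</code> and the example width of 26.3 are assumptions made here for illustration; they are not part of the original code.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rounded estimates copied from the summary(m1) output
alpha &lt;- -12.351  # intercept
beta &lt;- 0.497     # coefficient of Width

# Hypothetical helper: fitted probability of having at least one satellite
p_hat &lt;- function(width) plogis(alpha + beta * width)

width_example &lt;- 26.3  # illustrative width, not taken from the source
p_example &lt;- p_hat(width_example)

# Multiplicative change in the odds for a one-unit increase in Width
odds_mult &lt;- exp(beta)

# Width at which the fitted probability equals 0.5
el50 &lt;- -alpha / beta

# Approximate rate of change of the fitted probability at width_example
slope_example &lt;- beta * p_example * (1 - p_example)

print(c(p_example = p_example, odds_mult = odds_mult,
        el50 = el50, slope_example = slope_example))</code></pre></div>
<p>Because the inputs are rounded to three decimals, the results may differ slightly from values computed directly from <code>m1</code>, for example with <code>predict(m1, type = "response")</code>.</p>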
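<p>The textbook gives a detailed interpretation of this model. The code below shows one way to draw the textbook's figure for assessing the model fit (Figure 4.3).</p>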
<div class="sourceCode" id="cb235"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb235-1" data-line-number="1"><span class="kw">library</span>(dplyr)</a>
<a class="sourceLine" id="cb235-2" data-line-number="2"><span class="co"># Group by width first, then compute each group's mean width and proportion having satellites</span></a>
<a class="sourceLine" id="cb235-3" data-line-number="3">mean_width_vs_prop &lt;-<span class="st"> </span>horseshoecrabs <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb235-4" data-line-number="4"><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">width_group =</span> <span class="kw">cut</span>(Width, <span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">23.25</span> <span class="op">+</span><span class="st"> </span><span class="dv">0</span><span class="op">:</span><span class="dv">6</span>, <span class="ot">Inf</span>), <span class="dt">dig.lab =</span> <span class="dv">4</span>)) <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb235-5" data-line-number="5"><span class="st">  </span><span class="kw">group_by</span>(width_group) <span class="op">%&gt;%</span><span class="st"> </span><span class="co"># group the rows by width_group</span></a>
<a class="sourceLine" id="cb235-6" data-line-number="6"><span class="st">  </span><span class="kw">summarise</span>(</a>
<a class="sourceLine" id="cb235-7" data-line-number="7">    <span class="dt">prop =</span> <span class="kw">mean</span>(psat),  <span class="co"># proportion with satellites in each group</span></a>
<a class="sourceLine" id="cb235-8" data-line-number="8">    <span class="dt">mean_width =</span> <span class="kw">mean</span>(Width)  <span class="co"># mean width</span></a>
<a class="sourceLine" id="cb235-9" data-line-number="9">  )</a>
<a class="sourceLine" id="cb235-10" data-line-number="10"></a>
<a class="sourceLine" id="cb235-11" data-line-number="11">prop &lt;-<span class="st"> </span>mean_width_vs_prop<span class="op">$</span>prop  <span class="co"># proportion with satellites in each group</span></a>
<a class="sourceLine" id="cb235-12" data-line-number="12">mean_width &lt;-<span class="st"> </span>mean_width_vs_prop<span class="op">$</span>mean_width  <span class="co"># mean width of each group</span></a>
<a class="sourceLine" id="cb235-13" data-line-number="13"></a>
<a class="sourceLine" id="cb235-14" data-line-number="14"><span class="co"># Predicted probabilities at each group's mean width</span></a>
<a class="sourceLine" id="cb235-15" data-line-number="15">pred_prop &lt;-<span class="st"> </span><span class="kw">predict</span>(</a>
<a class="sourceLine" id="cb235-16" data-line-number="16">  m1, <span class="kw">data.frame</span>(<span class="dt">Width =</span> mean_width), <span class="dt">type =</span> <span class="st">&quot;response&quot;</span></a>
<a class="sourceLine" id="cb235-17" data-line-number="17">)</a>
<a class="sourceLine" id="cb235-18" data-line-number="18"></a>
<a class="sourceLine" id="cb235-19" data-line-number="19"><span class="co"># Data for drawing the fitted curve</span></a>
<a class="sourceLine" id="cb235-20" data-line-number="20">width_seq &lt;-<span class="st"> </span><span class="kw">seq</span>(<span class="dv">21</span>, <span class="dv">33</span>, <span class="fl">0.1</span>)</a>
<a class="sourceLine" id="cb235-21" data-line-number="21">pred_prop_seq &lt;-<span class="st"> </span><span class="kw">predict</span>(</a>
<a class="sourceLine" id="cb235-22" data-line-number="22">  m1, <span class="kw">data.frame</span>(<span class="dt">Width =</span> width_seq), <span class="dt">type =</span> <span class="st">&quot;response&quot;</span></a>
<a class="sourceLine" id="cb235-23" data-line-number="23">)</a>
<a class="sourceLine" id="cb235-24" data-line-number="24">  </a>
<a class="sourceLine" id="cb235-25" data-line-number="25">  </a>
<a class="sourceLine" id="cb235-26" data-line-number="26"><span class="kw">plot</span>(</a>
<a class="sourceLine" id="cb235-27" data-line-number="27">  prop <span class="op">~</span><span class="st"> </span>mean_width, <span class="dt">pch =</span> <span class="dv">20</span>,  <span class="co"># point type: solid circle</span></a>
<a class="sourceLine" id="cb235-28" data-line-number="28">  <span class="dt">xlim =</span> <span class="kw">c</span>(<span class="dv">22</span>, <span class="dv">32</span>), <span class="dt">ylim =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">1</span>),  <span class="co"># x- and y-axis ranges</span></a>
<a class="sourceLine" id="cb235-29" data-line-number="29">  <span class="dt">xlab =</span> <span class="st">&quot;Width&quot;</span>, <span class="dt">ylab =</span> <span class="st">&quot;Proportion Having Satellites&quot;</span>  <span class="co"># x- and y-axis labels</span></a>
<a class="sourceLine" id="cb235-30" data-line-number="30">)</a>
<a class="sourceLine" id="cb235-31" data-line-number="31"></a>
<a class="sourceLine" id="cb235-32" data-line-number="32"><span class="kw">points</span>(mean_width, pred_prop, <span class="dt">pch =</span> <span class="dv">3</span>)  <span class="co"># fitted values, point type: plus sign</span></a>
<a class="sourceLine" id="cb235-33" data-line-number="33"><span class="kw">lines</span>(width_seq, pred_prop_seq, <span class="dt">lty =</span> <span class="dv">2</span>)  <span class="co"># fitted curve, drawn as a dashed line</span></a>
<a class="sourceLine" id="cb235-34" data-line-number="34"><span class="kw">legend</span>(<span class="fl">28.5</span>, <span class="fl">0.2</span>, <span class="kw">c</span>(<span class="st">&quot;observed&quot;</span>, <span class="st">&quot;fitted&quot;</span>), <span class="dt">pch =</span> <span class="kw">c</span>(<span class="dv">20</span>, <span class="dv">3</span>))  <span class="co"># legend</span></a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-92-1.png" width="70%" style="display: block; margin: auto;" /></p>
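<p>To see how the <code>cut()</code> call in the plotting code assigns widths to groups, the sketch below applies the same break points to a few made-up widths. The vector <code>toy_width</code> is hypothetical and is not taken from the <code>horseshoecrabs</code> data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Break points copied from the plotting code:
# (0, 23.25], (23.25, 24.25], ..., (28.25, 29.25], (29.25, Inf]
breaks &lt;- c(0, 23.25 + 0:6, Inf)

# Made-up widths for illustration only
toy_width &lt;- c(21.0, 23.25, 23.3, 26.0, 29.25, 30.1)

# cut() returns a factor; intervals are closed on the right by default
toy_group &lt;- cut(toy_width, breaks, dig.lab = 4)

print(data.frame(toy_width, toy_group))
print(nlevels(toy_group))  # number of width groups defined by the breaks</code></pre></div>
<p>Since the intervals are closed on the right, a width exactly equal to a break point such as 23.25 falls into the lower of the two neighbouring groups.</p>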
</div>
</div>
<div id="infer-logistic" class="section level2">
<h2><span class="header-section-number">4.2</span> Inference for Logistic Regression</h2>
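<p>One possible illustration of inference for the width effect is sketched below. It uses only the rounded numbers from the <code>summary(m1)</code> output in Section 4.1 (estimate 0.497, standard error 0.102, null deviance 225.76 on 172 degrees of freedom, residual deviance 194.45 on 171 degrees of freedom). All variable names are assumptions introduced for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rounded values copied from the summary(m1) output in Section 4.1
beta_hat &lt;- 0.497    # estimated Width coefficient
se_beta &lt;- 0.102     # its standard error
null_dev &lt;- 225.76   # null deviance (172 df)
resid_dev &lt;- 194.45  # residual deviance (171 df)

# Wald z statistic and its two-sided p-value
z_wald &lt;- beta_hat / se_beta
p_wald &lt;- 2 * pnorm(abs(z_wald), lower.tail = FALSE)

# 95% Wald confidence interval for beta and for the odds multiplier exp(beta)
ci_beta &lt;- beta_hat + c(-1, 1) * qnorm(0.975) * se_beta
ci_odds &lt;- exp(ci_beta)

# Likelihood-ratio statistic: drop in deviance when Width enters the model
lr_stat &lt;- null_dev - resid_dev
lr_df &lt;- 172 - 171
p_lr &lt;- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(z_wald = z_wald, p_wald = p_wald, lr_stat = lr_stat, p_lr = p_lr))
print(rbind(beta = ci_beta, odds = ci_odds))</code></pre></div>
<p>Because the inputs are rounded, the Wald statistic computed here differs slightly from the 4.89 printed by <code>summary()</code>. With the fitted object available, <code>anova(m1, test = "LRT")</code> gives the likelihood-ratio test directly.</p>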
</div>
<div id="cate-var-logistic" class="section level2">
<h2><span class="header-section-number">4.3</span> Logistic Regression with Categorical Predictors</h2>
<div id="azt和aids" class="section level3 unnumbered">
<h3>AZT and AIDS</h3>
<div class="sourceCode" id="cb236"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb236-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb236-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;AZT&quot;</span>)</a>
<a class="sourceLine" id="cb236-3" data-line-number="3">AZT0 &lt;-<span class="st"> </span><span class="kw">as.data.frame</span>(AZT)</a>
<a class="sourceLine" id="cb236-4" data-line-number="4"><span class="co"># Construct the response variable</span></a>
<a class="sourceLine" id="cb236-5" data-line-number="5">AZT0<span class="op">$</span>y &lt;-<span class="st"> </span>AZT0<span class="op">$</span>Symptoms <span class="op">==</span><span class="st"> &quot;Yes&quot;</span></a>
<a class="sourceLine" id="cb236-6" data-line-number="6"><span class="co"># Fit the model</span></a>
<a class="sourceLine" id="cb236-7" data-line-number="7">AZT.glm &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb236-8" data-line-number="8">  y <span class="op">~</span><span class="st"> </span>(AZTUse <span class="op">==</span><span class="st"> &quot;Yes&quot;</span>) <span class="op">+</span><span class="st"> </span>(Race <span class="op">==</span><span class="st"> &quot;White&quot;</span>),</a>
<a class="sourceLine" id="cb236-9" data-line-number="9">  <span class="dt">data =</span> AZT0,</a>
<a class="sourceLine" id="cb236-10" data-line-number="10">  <span class="dt">weights =</span> Freq,</a>
<a class="sourceLine" id="cb236-11" data-line-number="11">  <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb236-12" data-line-number="12">)</a>
<a class="sourceLine" id="cb236-13" data-line-number="13"><span class="kw">summary</span>(AZT.glm)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = y ~ (AZTUse == &quot;Yes&quot;) + (Race == &quot;White&quot;), family = binomial(), 
##     data = AZT0, weights = Freq)
## 
## Deviance Residuals: 
##     1      2      3      4      5      6      7      8  
##  7.29   6.54   9.21   5.73  -5.49  -4.00  -7.07  -5.03  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)          -1.0736     0.2629   -4.08  4.4e-05
## AZTUse == &quot;Yes&quot;TRUE  -0.7195     0.2790   -2.58   0.0099
## Race == &quot;White&quot;TRUE   0.0555     0.2886    0.19   0.8475
##                        
## (Intercept)         ***
## AZTUse == &quot;Yes&quot;TRUE ** 
## Race == &quot;White&quot;TRUE    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 342.12  on 7  degrees of freedom
## Residual deviance: 335.15  on 5  degrees of freedom
## AIC: 341.2
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<div class="sourceCode" id="cb238"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb238-1" data-line-number="1"><span class="co"># Likelihood-ratio test</span></a>
<a class="sourceLine" id="cb238-2" data-line-number="2"><span class="kw">anova</span>(AZT.glm, <span class="dt">test=</span><span class="st">&quot;LRT&quot;</span>)</a></code></pre></div>
<pre><code>## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: y
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev Pr(&gt;Chi)
## NULL                                7        342         
## AZTUse == &quot;Yes&quot;  1     6.93         6        335   0.0085
## Race == &quot;White&quot;  1     0.04         5        335   0.8473
##                   
## NULL              
## AZTUse == &quot;Yes&quot; **
## Race == &quot;White&quot;   
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1</code></pre>
</div>
</div>
<div id="multi-logistic" class="section level2">
<h2><span class="header-section-number">4.4</span> Multiple Logistic Regression</h2>
<div id="母鲎及其追随者多元logistic" class="section level3 unnumbered">
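<h3>Female Horseshoe Crabs and Their Satellites (Multiple Logistic Regression)</h3>
<p>After the data are loaded from the <code>cdabookdb</code> package, the <code>Color</code> column is numeric and must first be converted to a factor. In addition, when a factor is used in a regression, R takes the first factor level as the reference category; to match the textbook's results, the example below uses color 4 as the reference category.</p>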
<div class="sourceCode" id="cb240"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb240-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb240-2" data-line-number="2"><span class="kw">library</span>(dplyr)</a>
<a class="sourceLine" id="cb240-3" data-line-number="3"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb240-4" data-line-number="4">horseshoecrabs &lt;-<span class="st"> </span>horseshoecrabs <span class="op">%&gt;%</span><span class="st"> </span></a>
<a class="sourceLine" id="cb240-5" data-line-number="5"><span class="st">  </span><span class="kw">mutate</span>(</a>
<a class="sourceLine" id="cb240-6" data-line-number="6">    <span class="dt">Color_factor =</span> <span class="kw">factor</span>(Color, <span class="dv">4</span><span class="op">:</span><span class="dv">1</span>),  <span class="co"># Convert Color to a factor with levels ordered 4, 3, 2, 1</span></a>
<a class="sourceLine" id="cb240-7" data-line-number="7">    <span class="dt">psat =</span> <span class="kw">as.integer</span>(Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)  <span class="co"># psat indicates whether the crab has any satellites</span></a>
<a class="sourceLine" id="cb240-8" data-line-number="8">  )</a>
<a class="sourceLine" id="cb240-9" data-line-number="9"></a>
<a class="sourceLine" id="cb240-10" data-line-number="10"></a>
<a class="sourceLine" id="cb240-11" data-line-number="11">m1 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb240-12" data-line-number="12">  psat <span class="op">~</span><span class="st"> </span>Width <span class="op">+</span><span class="st"> </span>Color_factor, <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>()</a>
<a class="sourceLine" id="cb240-13" data-line-number="13">)</a>
<a class="sourceLine" id="cb240-14" data-line-number="14"><span class="kw">summary</span>(m1)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Width + Color_factor, family = binomial(), 
##     data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.112  -0.985   0.524   0.851   2.141  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)    -12.715      2.762   -4.60  4.1e-06 ***
## Width            0.468      0.106    4.43  9.3e-06 ***
## Color_factor3    1.106      0.592    1.87    0.062 .  
## Color_factor2    1.402      0.548    2.56    0.011 *  
## Color_factor1    1.330      0.853    1.56    0.119    
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 187.46  on 168  degrees of freedom
## AIC: 197.5
## 
## Number of Fisher Scoring iterations: 4</code></pre>
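<p>The effect of the level order on the reference category can be checked on a small made-up factor. The sketch below uses a toy vector <code>color</code> (hypothetical values, not the crab data) and compares the default level order with two ways of putting level 4 first: listing the levels explicitly, as in the example above, and <code>relevel()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy color codes (hypothetical values, not the horseshoe crab data)
color &lt;- c(1, 2, 3, 4, 2, 3)

# Default: levels are sorted, so level &quot;1&quot; would be the reference category
f_default &lt;- factor(color)
levels(f_default)

# Listing the levels as 4:1 puts &quot;4&quot; first, making it the reference category
f_rev &lt;- factor(color, levels = 4:1)
levels(f_rev)

# relevel() moves only the chosen level to the front and keeps the others in order
f_relevel &lt;- relevel(f_default, ref = &quot;4&quot;)
levels(f_relevel)</code></pre></div>
<p>Under R's default treatment contrasts the first level is the baseline, so either version makes color 4 the reference. The two differ only in the order of the remaining levels, and hence in the order of the reported coefficients.</p>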
<p>See the textbook for a detailed interpretation of the model. The plot below shows the predicted probability as a function of width for each of the four colors (Figure 4.4 in the textbook).</p>
<div class="sourceCode" id="cb242"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb242-1" data-line-number="1"><span class="co"># Draw an empty plot</span></a>
<a class="sourceLine" id="cb242-2" data-line-number="2"><span class="kw">plot</span>(</a>
<a class="sourceLine" id="cb242-3" data-line-number="3">  <span class="ot">NULL</span>,  <span class="co"># Plot no points or lines; create an empty canvas for the curves added below</span></a>
<a class="sourceLine" id="cb242-4" data-line-number="4">  <span class="dt">xlim =</span> <span class="kw">c</span>(<span class="dv">18</span>, <span class="dv">34</span>), <span class="dt">ylim =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">1</span>),  <span class="co"># Axis ranges</span></a>
<a class="sourceLine" id="cb242-5" data-line-number="5">  <span class="dt">xlab =</span> <span class="st">&quot;Width&quot;</span>, <span class="dt">ylab =</span> <span class="st">&quot;Predicted Probability&quot;</span>  <span class="co"># Axis labels</span></a>
<a class="sourceLine" id="cb242-6" data-line-number="6">)</a>
<a class="sourceLine" id="cb242-7" data-line-number="7"></a>
<a class="sourceLine" id="cb242-8" data-line-number="8"><span class="kw">sapply</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">4</span>, <span class="cf">function</span>(i) {</a>
<a class="sourceLine" id="cb242-9" data-line-number="9">  newdata &lt;-<span class="st"> </span><span class="kw">data.frame</span>(</a>
<a class="sourceLine" id="cb242-10" data-line-number="10">    <span class="dt">Width =</span> <span class="kw">seq</span>(<span class="dv">17</span>, <span class="dv">35</span>, <span class="fl">0.1</span>),</a>
<a class="sourceLine" id="cb242-11" data-line-number="11">    <span class="dt">Color_factor =</span> <span class="kw">as.character</span>(i)</a>
<a class="sourceLine" id="cb242-12" data-line-number="12">  )</a>
<a class="sourceLine" id="cb242-13" data-line-number="13">  pred_prop &lt;-<span class="st"> </span><span class="kw">predict</span>(m1, newdata, <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)  <span class="co"># Compute the predicted probabilities</span></a>
<a class="sourceLine" id="cb242-14" data-line-number="14">  <span class="kw">points</span>(newdata<span class="op">$</span>Width, pred_prop, <span class="dt">type =</span> <span class="st">&quot;l&quot;</span>, <span class="dt">col =</span> i)  <span class="co"># Draw the curve</span></a>
<a class="sourceLine" id="cb242-15" data-line-number="15">})</a>
<a class="sourceLine" id="cb242-16" data-line-number="16"></a>
<a class="sourceLine" id="cb242-17" data-line-number="17"><span class="kw">legend</span>(<span class="dv">28</span>, <span class="fl">0.4</span>, <span class="dt">col =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">4</span>, <span class="dt">legend =</span> <span class="kw">paste0</span>(<span class="st">&quot;Color&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">4</span>), <span class="dt">lty =</span> <span class="dv">1</span>)  <span class="co"># Legend</span></a></code></pre></div>
<p><img src="cdacode_files/figure-html/unnamed-chunk-95-1.png" width="70%" style="display: block; margin: auto;" /></p>
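<p>Next, consider the treatment of ordinal predictors in Section 4.4.3. The example is similar to the one in Section 4.4.1, but color now enters the model as a numeric score rather than as a factor. The scores coincide with the values stored in the data set, so no extra processing is needed and the model can be fitted directly.</p>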
<div class="sourceCode" id="cb243"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb243-1" data-line-number="1">m2 &lt;-<span class="st"> </span><span class="kw">glm</span>(psat <span class="op">~</span><span class="st"> </span>Color <span class="op">+</span><span class="st"> </span>Width, <span class="dt">family =</span> <span class="kw">binomial</span>(), <span class="dt">data =</span> horseshoecrabs)</a>
<a class="sourceLine" id="cb243-2" data-line-number="2"><span class="kw">summary</span>(m2)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Color + Width, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.169  -0.989   0.543   0.870   1.974  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)  -10.071      2.807   -3.59  0.00033 ***
## Color         -0.509      0.224   -2.28  0.02286 *  
## Width          0.458      0.104    4.41  1.1e-05 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 189.12  on 170  degrees of freedom
## AIC: 195.1
## 
## Number of Fisher Scoring iterations: 4</code></pre>
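<p>The score model is nested in the factor model, so one way to judge whether a single linear score is adequate is a likelihood-ratio comparison of the two residual deviances. The sketch below does the arithmetic by hand with the values printed in the two summaries (187.46 on 168 df for the factor coding, 189.12 on 170 df for the score); the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Residual deviances and degrees of freedom copied from the printed summaries
dev_factor &lt;- 187.46; df_factor &lt;- 168   # color as a factor (m1)
dev_score  &lt;- 189.12; df_score  &lt;- 170   # color as a numeric score (m2)

lr_stat &lt;- dev_score - dev_factor        # likelihood-ratio statistic
lr_df   &lt;- df_score - df_factor          # number of extra parameters in m1
p_value &lt;- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

round(c(statistic = lr_stat, df = lr_df, p.value = p_value), 4)</code></pre></div>
<p>The difference of 1.66 on 2 degrees of freedom gives a p-value of about 0.44, so these data give no evidence that the simpler score model fits worse. If both fitted objects are still in the session, <code>anova(m2, m1, test = &quot;LRT&quot;)</code> should report the same comparison.</p>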
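<p>Section 4.4.4 introduces an interaction effect. Before fitting, a dummy variable for crab color must be constructed as described in the textbook; the model containing the interaction is then fitted. Note that, as coded below, <code>is_dark</code> is <code>TRUE</code> for colors 1 to 3 and <code>FALSE</code> for color 4.</p>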
<div class="sourceCode" id="cb245"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb245-1" data-line-number="1">horseshoecrabs<span class="op">$</span>is_dark &lt;-<span class="st"> </span><span class="kw">as.character</span>(horseshoecrabs<span class="op">$</span>Color <span class="op">&lt;</span><span class="st"> </span><span class="dv">4</span>)</a>
<a class="sourceLine" id="cb245-2" data-line-number="2"><span class="co"># is_dark * Width includes the interaction as well as the main effects of is_dark and Width</span></a>
<a class="sourceLine" id="cb245-3" data-line-number="3"><span class="co"># To include only the interaction term, use is_dark:Width</span></a>
<a class="sourceLine" id="cb245-4" data-line-number="4">m3 &lt;-<span class="st"> </span><span class="kw">glm</span>(</a>
<a class="sourceLine" id="cb245-5" data-line-number="5">  psat <span class="op">~</span><span class="st"> </span>is_dark <span class="op">*</span><span class="st"> </span>Width,</a>
<a class="sourceLine" id="cb245-6" data-line-number="6">  <span class="dt">family =</span> <span class="kw">binomial</span>(), </a>
<a class="sourceLine" id="cb245-7" data-line-number="7">  <span class="dt">data =</span> horseshoecrabs</a>
<a class="sourceLine" id="cb245-8" data-line-number="8">)</a>
<a class="sourceLine" id="cb245-9" data-line-number="9"><span class="kw">summary</span>(m3)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ is_dark * Width, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.136  -0.934   0.500   0.855   1.775  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(&gt;|z|)
## (Intercept)         -5.854      6.694   -0.87     0.38
## is_darkTRUE         -6.958      7.318   -0.95     0.34
## Width                0.200      0.262    0.77     0.44
## is_darkTRUE:Width    0.322      0.286    1.13     0.26
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 186.79  on 169  degrees of freedom
## AIC: 194.8
## 
## Number of Fisher Scoring iterations: 4</code></pre>
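<p>The difference between <code>*</code> and <code>:</code> noted in the code comments can be seen by expanding the formulas with <code>terms()</code>. In this small sketch the names <code>y</code>, <code>a</code>, and <code>b</code> are placeholders; no data are needed because only the term structure is inspected.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># a * b expands to both main effects plus their interaction
full_terms &lt;- attr(terms(y ~ a * b), &quot;term.labels&quot;)
full_terms   # &quot;a&quot; &quot;b&quot; &quot;a:b&quot;

# a:b keeps the interaction term only
only_inter &lt;- attr(terms(y ~ a:b), &quot;term.labels&quot;)
only_inter   # &quot;a:b&quot;</code></pre></div>
<p>In other words, <code>a * b</code> is shorthand for <code>a + b + a:b</code>.</p>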
</div>
</div>
<div id="logistic回归效应的概括" class="section level2">
<h2><span class="header-section-number">4.5</span> Summarizing Effects in Logistic Regression</h2>
</div>
<div id="problem-ch4" class="section level2 unnumbered">
<h2>Exercises</h2>
<div id="第8题" class="section level3 unnumbered">
<h3>Exercise 8</h3>
<ol style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb247"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb247-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb247-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;horseshoecrabs&quot;</span>)</a>
<a class="sourceLine" id="cb247-3" data-line-number="3">horseshoecrabs<span class="op">$</span>psat &lt;-<span class="st"> </span><span class="kw">as.integer</span>(horseshoecrabs<span class="op">$</span>Satellites <span class="op">&gt;</span><span class="st"> </span><span class="dv">0</span>)</a>
<a class="sourceLine" id="cb247-4" data-line-number="4">m_crab &lt;-<span class="st"> </span><span class="kw">glm</span>(psat <span class="op">~</span><span class="st"> </span>Weight, <span class="dt">data =</span> horseshoecrabs, <span class="dt">family =</span> <span class="kw">binomial</span>())</a>
<a class="sourceLine" id="cb247-5" data-line-number="5"><span class="kw">summary</span>(m_crab)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = psat ~ Weight, family = binomial(), data = horseshoecrabs)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.111  -1.075   0.543   0.912   1.629  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)    
## (Intercept)   -3.695      0.880   -4.20  2.7e-05 ***
## Weight         1.815      0.377    4.82  1.4e-06 ***
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.7
## 
## Number of Fisher Scoring iterations: 4</code></pre>
<p>Thus the fitted model is <span class="math inline">\(\mathrm{logit}(\pi) = -3.6947 + 1.8151\,\mathrm{Weight}\)</span></p>
<ol start="2" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb249"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb249-1" data-line-number="1"><span class="kw">predict</span>(m_crab, <span class="kw">data.frame</span>(<span class="dt">Weight =</span> <span class="kw">c</span>(<span class="fl">1.2</span>, <span class="fl">2.44</span>, <span class="fl">5.2</span>)), <span class="dt">type =</span> <span class="st">&quot;response&quot;</span>)</a></code></pre></div>
<pre><code>##      1      2      3 
## 0.1800 0.6757 0.9968</code></pre>
<p>Thus the estimated probabilities that a female crab has a satellite at these three weights are 18.00%, 67.57%, and 99.68%, respectively.</p>
<ol start="3" style="list-style-type: lower-alpha">
<li></li>
</ol>
<p><span class="math display">\[\pi=\frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}=0.5\]</span>
<span class="math display">\[e^{\alpha+\beta x}=1\]</span>
<span class="math display">\[\alpha+\beta x=0\]</span>
<span class="math display">\[x=-\frac{\alpha}{\beta}=2.0355\]</span></p>
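<p>The derivation above can be checked numerically. This is a minimal sketch that copies the two estimates from the printed summary of <code>m_crab</code>; the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">alpha_hat &lt;- -3.6947   # intercept estimate from summary(m_crab)
beta_hat  &lt;-  1.8151   # Weight coefficient from summary(m_crab)

# Weight at which the fitted probability equals 0.5
x_half &lt;- -alpha_hat / beta_hat

# Plugging x_half back into the inverse logit should return 0.5
p_check &lt;- plogis(alpha_hat + beta_hat * x_half)

round(c(x_half = x_half, p_check = p_check), 4)</code></pre></div>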
<ol start="4" style="list-style-type: lower-alpha">
<li></li>
</ol>
<p>i). <span class="math inline">\(\hat{\beta}\pi(1-\pi) = 1.8151 \times 0.25 = 0.4538\)</span></p>
<p>ii). <span class="math inline">\(0.1\times 0.4538=0.0454\)</span></p>
<p>iii). <span class="math inline">\(0.58\times 0.4538=0.2632\)</span></p>
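<p>The linear approximation multiplies the slope of the fitted curve at <span class="math inline">\(\pi = 0.5\)</span> by the size of the increment. The sketch below repeats that calculation for the three increments used above and, for comparison, also evaluates the exact change in the fitted probability; the estimates are copied from the printed summary of <code>m_crab</code> and the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">alpha_hat &lt;- -3.6947   # intercept estimate from summary(m_crab)
beta_hat  &lt;-  1.8151   # Weight coefficient from summary(m_crab)
x_half    &lt;- -alpha_hat / beta_hat   # weight where the fitted probability is 0.5
delta     &lt;- c(1, 0.1, 0.58)         # increments in weight

# Linear approximation: slope beta * pi * (1 - pi) at pi = 0.5, times the increment
linear_approx &lt;- beta_hat * 0.5 * (1 - 0.5) * delta

# Exact change in the fitted probability when moving from x_half to x_half + delta
exact_change &lt;- plogis(alpha_hat + beta_hat * (x_half + delta)) - 0.5

round(rbind(linear_approx, exact_change), 4)</code></pre></div>
<p>Because the logistic curve flattens above <span class="math inline">\(\pi = 0.5\)</span>, the approximation overstates the change, and the gap grows with the size of the increment.</p>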
<ol start="5" style="list-style-type: lower-alpha">
<li>The 95% Wald confidence interval for <span class="math inline">\(\beta\)</span> is <span class="math inline">\([1.8151-1.96\times 0.3767, 1.8151+1.96\times 0.3767]=[1.0768, 2.5534]\)</span></li>
</ol>
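<p>The 95% confidence interval for the odds ratio is <span class="math inline">\([e^{1.0768}, e^{2.5534}]=[2.9352, 12.8511]\)</span></p>
<p>The whole interval lies above 1, so the odds of having a satellite increase markedly with weight: each additional unit of weight multiplies the estimated odds by a factor between about 2.9 and 12.9.</p>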
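<p>The same Wald interval can be reproduced in a few lines. This sketch copies the estimate and standard error from the printed summary of <code>m_crab</code> and uses <code>qnorm(0.975)</code> in place of the rounded value 1.96, so the last digit may differ slightly; the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">beta_hat &lt;- 1.8151   # Weight coefficient from summary(m_crab)
se_hat   &lt;- 0.3767   # its standard error (unrounded value behind the printed 0.377)

z_crit  &lt;- qnorm(0.975)                           # about 1.96
ci_beta &lt;- beta_hat + c(-1, 1) * z_crit * se_hat  # Wald interval on the logit scale
ci_or   &lt;- exp(ci_beta)                           # interval for the odds ratio

round(rbind(ci_beta, ci_or), 4)</code></pre></div>
<p>With the fitted object available, <code>confint.default(m_crab)</code> should give the same Wald interval directly, whereas <code>confint(m_crab)</code> reports a profile-likelihood interval that can differ somewhat.</p>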
<ol start="6" style="list-style-type: lower-alpha">
<li><span class="math display">\[z^2=\Big(\frac{1.8151}{0.3767}\Big)^2=23.2172\]</span></li>
</ol>
<p>With <span class="math inline">\(df=1\)</span>, the p-value is less than 0.0001,</p>
<p>so there is strong evidence of a weight effect.</p>
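<p>The Wald statistic and its p-value can be computed from the estimate and its standard error. This is a minimal sketch using the values from the printed summary of <code>m_crab</code>; the squared z statistic is referred to a chi-squared distribution with one degree of freedom.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">beta_hat &lt;- 1.8151   # Weight coefficient from summary(m_crab)
se_hat   &lt;- 0.3767   # its standard error

z_stat    &lt;- beta_hat / se_hat    # Wald z statistic
wald_stat &lt;- z_stat^2             # chi-squared statistic with df = 1
p_value   &lt;- pchisq(wald_stat, df = 1, lower.tail = FALSE)

c(z = z_stat, wald = wald_stat, p.value = p_value)</code></pre></div>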
</div>
<div id="第24题" class="section level3 unnumbered">
<h3>Exercise 24</h3>
<ol style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb251"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb251-1" data-line-number="1"><span class="kw">library</span>(cdabookdb)</a>
<a class="sourceLine" id="cb251-2" data-line-number="2"><span class="kw">data</span>(<span class="st">&quot;throat&quot;</span>)</a>
<a class="sourceLine" id="cb251-3" data-line-number="3">m_throat &lt;-<span class="st"> </span><span class="kw">glm</span>(Y <span class="op">~</span><span class="st"> </span>D <span class="op">+</span><span class="st"> </span><span class="kw">factor</span>(T), <span class="dt">data =</span> throat, <span class="dt">family =</span> <span class="kw">binomial</span>())</a>
<a class="sourceLine" id="cb251-4" data-line-number="4"><span class="kw">summary</span>(m_throat)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Y ~ D + factor(T), family = binomial(), data = throat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.380  -0.536   0.305   0.731   1.782  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(&gt;|z|)   
## (Intercept)  -1.4173     1.0946   -1.29   0.1954   
## D             0.0687     0.0264    2.60   0.0093 **
## factor(T)1   -1.6589     0.9229   -1.80   0.0722 . 
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 46.180  on 34  degrees of freedom
## Residual deviance: 30.138  on 32  degrees of freedom
## AIC: 36.14
## 
## Number of Fisher Scoring iterations: 5</code></pre>
<p>Thus the fitted model is <span class="math display">\[\mathrm{logit}(\pi) = -1.417 + 0.069D - 1.659T\]</span></p>
<p>The model indicates that, holding the other variable fixed:</p>
<ul>
<li>For each one-unit increase in D, the odds of <span class="math inline">\(Y=1\)</span> are multiplied by <span class="math inline">\(e^{0.0687}=1.0711\)</span></li>
<li>The odds ratio comparing <span class="math inline">\(T=1\)</span> with <span class="math inline">\(T=0\)</span> is <span class="math inline">\(e^{-1.659}=0.1903\)</span></li>
</ul>
<ol start="2" style="list-style-type: lower-alpha">
<li><p>According to the R output, the p-value for <span class="math inline">\(\hat{\beta}_D\)</span> is <span class="math inline">\(0.0093 &lt; 0.01\)</span>, so there is evidence of an effect of D.</p></li>
<li></li>
</ol>
<div class="sourceCode" id="cb253"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb253-1" data-line-number="1">m0_throat &lt;-<span class="st"> </span><span class="kw">glm</span>(Y <span class="op">~</span><span class="st"> </span>D <span class="op">*</span><span class="st"> </span><span class="kw">factor</span>(T), <span class="dt">data =</span> throat, <span class="dt">family =</span> <span class="kw">binomial</span>())</a>
<a class="sourceLine" id="cb253-2" data-line-number="2"><span class="kw">summary</span>(m0_throat)</a></code></pre></div>
<pre><code>## 
## Call:
## glm(formula = Y ~ D * factor(T), family = binomial(), data = throat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.971  -0.378   0.345   0.729   1.996  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(&gt;|z|)  
## (Intercept)    0.0498     1.4694    0.03     0.97  
## D              0.0285     0.0343    0.83     0.41  
## factor(T)1    -4.4722     2.4671   -1.81     0.07 .
## D:factor(T)1   0.0746     0.0578    1.29     0.20  
## ---
## Signif. codes:  
## 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 46.180  on 34  degrees of freedom
## Residual deviance: 28.321  on 31  degrees of freedom
## AIC: 36.32
## 
## Number of Fisher Scoring iterations: 6</code></pre>
<p>When <span class="math inline">\(T=1\)</span>, <span class="math display">\[\mathrm{logit}(\pi)  = 0.0498 - 4.4722 + 0.0285D + 0.0746D = -4.4224 + 0.1031D\]</span>
When <span class="math inline">\(T=0\)</span>, <span class="math display">\[\mathrm{logit}(\pi)  = 0.0498 + 0.0285D\]</span></p>
<p>In the model for <span class="math inline">\(T=1\)</span>, each one-unit increase in D multiplies the odds by <span class="math inline">\(e^{0.1031}=1.1086\)</span>.
In the model for <span class="math inline">\(T=0\)</span>, each one-unit increase in D multiplies the odds by <span class="math inline">\(e^{0.0285}=1.0289\)</span>.</p>
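<p>The two group-specific equations follow mechanically from the four coefficients of the interaction model. The sketch below copies the estimates from the printed summary of <code>m0_throat</code> and assembles the intercept and slope for each value of T; the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Estimates copied from summary(m0_throat)
b0  &lt;-  0.0498   # (Intercept)
bD  &lt;-  0.0285   # D
bT  &lt;- -4.4722   # factor(T)1
bDT &lt;-  0.0746   # D:factor(T)1

# Intercept and slope of logit(pi) as a function of D within each group
line_T0 &lt;- c(intercept = b0,      slope = bD)
line_T1 &lt;- c(intercept = b0 + bT, slope = bD + bDT)

# Multiplicative change in the odds for a one-unit increase in D
odds_mult &lt;- exp(c(T0 = line_T0[[&quot;slope&quot;]], T1 = line_T1[[&quot;slope&quot;]]))

round(rbind(line_T0, line_T1), 4)
round(odds_mult, 4)</code></pre></div>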
<ol start="4" style="list-style-type: lower-alpha">
<li></li>
</ol>
<div class="sourceCode" id="cb255"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb255-1" data-line-number="1"><span class="kw">anova</span>(m_throat, m0_throat, <span class="dt">test =</span> <span class="st">&quot;Chisq&quot;</span>)</a></code></pre></div>
<pre><code>## Analysis of Deviance Table
## 
## Model 1: Y ~ D + factor(T)
## Model 2: Y ~ D * factor(T)
##   Resid. Df Resid. Dev Df Deviance Pr(&gt;Chi)
## 1        32       30.1                     
## 2        31       28.3  1     1.82     0.18</code></pre>
<p>The p-value is <span class="math inline">\(0.1777&gt;0.05\)</span>, so there is no evidence that the interaction term is needed.</p>
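<p>The p-value reported by <code>anova()</code> can be recovered from the two residual deviances alone. This is a small sketch using the unrounded deviances from the two model summaries (30.138 and 28.321); the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Residual deviances and degrees of freedom from the two summaries
dev_main  &lt;- 30.138; df_main  &lt;- 32   # Y ~ D + factor(T)
dev_inter &lt;- 28.321; df_inter &lt;- 31   # Y ~ D * factor(T)

lr_stat &lt;- dev_main - dev_inter       # likelihood-ratio statistic
lr_df   &lt;- df_main - df_inter         # one extra parameter for the interaction
p_value &lt;- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

round(c(statistic = lr_stat, df = lr_df, p.value = p_value), 4)</code></pre></div>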

</div>
</div>
</div>
            </section>

          </div>
        </div>
      </div>
<a href="glm.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="build-and-apply-logistic-model.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
    </div>
  </div>
<script src="libs/gitbook/js/app.min.js"></script>
<script src="libs/gitbook/js/lunr.js"></script>
<script src="libs/gitbook/js/plugin-search.js"></script>
<script src="libs/gitbook/js/plugin-sharing.js"></script>
<script src="libs/gitbook/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook/js/plugin-bookdown.js"></script>
<script src="libs/gitbook/js/jquery.highlight.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": false,
"twitter": false,
"google": false,
"linkedin": false,
"weibo": false,
"instapper": false,
"vk": false,
"all": ["facebook", "google", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/jinzhen-lin/cdacode-document/edit/master/04-logistic-regression.Rmd",
"text": "编辑"
},
"download": ["cdacode.pdf", "cdacode.epub", "cdacode.zip"],
"toc": {
"collapse": "section"
}
});
});
</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:" && /^https?:/.test(src))
      src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
</body>

</html>
